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Abstract — We address the problem of controlling a stochastic 
version of a Dubins vehicle such that the probability of 
satisfying a temporal logic specification over a set of properties 
at the regions in a partitioned environment is maximized. We 
assume that the vehicle can determine its precise initial position 
in a known map of the environment. However, inspired by 
practical Hmitations, we assume that the vehicle is equipped 
with noisy actuators and, during its motion in the environment, 
it can only measure its angular velocity using a limited 
accuracy gyroscope. Through quantization and discretization, 
we construct a finite approximation for the motion of the vehicle 
in the form of a Markov Decision Process (MDP). We allow for 
task specifications given as temporal logic statements over the 
environmental properties, and use tools in Probabilistic Com- 
putation Tree Logic (PCTL) to generate an MDP control policy 
that maximizes the probability of satisfaction. We translate this 
policy to a vehicle feedback control strategy and show that 
the probability that the vehicle satisfies the specification in the 
original environment is bounded from below by the maximum 
probability of satisfying the specification on the MDP. 

I. Introduction 

In "classical" motion planing problems [LaV06], the spec- 
ifications are usually restricted to simple primitives of the 
type "go from A io B and avoid obstacles", where A and B 
are two regions of interest in some environment. Often this is 
not rich enough to describe a task of interest in practical ap- 
plications. Recently, it has been shown that temporal logics, 
such as Linear Temporal Logic (LTL) and Computational 
Tree Logic (CTL), can serve as rich languages capable of 
specifying complex motion missions such as "go to region 
A and avoid region B unless regions C or D are visited" (see, 
for example, [KF08], [KB08b], [KGFP07], [BKVIO]). 

In order to use tools from formal verification and automata 
games [BKL08] for motion planning and control, most of 
the works using temporal logics as specification languages 
assume that the motion of the vehicle in the environment can 
be modeled as a finite system. This usually takes the form of 
a finite transition system [CGP99] that is either deterministic 
(applying an available action triggers a unique transition 
[KBOSb]) or nondeterministic (applying an available action 
can enable multiple transitions, with no information on 
their likelihoods [KB08a]). More recent results show that, 
if sensor and actuator noise models can be obtained through 
experimental trials, then the robot motion can be modeled 
as a Markov Decision Process (MDP), and probabilistic 
temporal logics, such as Probabilistic CTL (PCTL) and 
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Probabilistic LTL (PLTL), can be used for motion planning 
and control (see [LWABIO]). 

However, robot dynamics are normally described by con- 
trol systems with state and control variables evaluated over 
infinite domains. A widely used approach for temporal 
logic verification and control of such a system is through 
the construction of a finite abstraction [TP06], [Gir07], 
[YTC^12]). Even though recent works discuss the construc- 
tion of abstractions for stochastic systems [JP09], [ADBS08], 
[DAB SOS], the existing methods are either not applicable to 
robot dynamics or are computationally infeasible given the 
size of the problem in most robotic applications. 

In this paper, we provide a conservative solution to the 
problem of controlling a stochastic Dubins vehicle such that 
the probability of satisfying a temporal logic specification 
over a set of properties at the regions in a partitioned 
environment is maximized. Inspired by a realistic scenario 
of an indoor vehicle leaving its charging station, we assume 
that the vehicle can determine its precise initial position 
in a known map of the environment. The actuator noise is 
modeled as a random variable with an arbitrary continuous 
probabifity distribution supported on a bounded interval. 
Also, we assume that the vehicle is equipped with a limited 
accuracy gyroscope, which measures its angular velocity, as 
the only means of measurement available. 

By discretization and quantization, we capture the motion 
of the vehicle as well as the position uncertainty as a finite 
state MDP. In this setup, the vehicle control problem is 
converted to the problem of finding a control policy for an 
MDP such that the probability of satisfying a PCTL formula 
is maximized. For the latter, we use the approach from 
[LWABIO]. By establishing a mapping between the states of 
the MDP and the sequences of measurements obtained from 
the gyroscope, we show that a policy for the MDP becomes 
equivalent to a feedback control strategy for the vehicle in the 
environment. Finally, we show that the probability that the 
vehicle satisfies the specification in the original environment 
is bounded from below by the maximum probability of 
satisfying the specification on the MDP. 

The main contribution of this work lies in the application, 
since the result holds for a realistic vehicle with noisy 
actuators and a limited accuracy gyroscope. The method that 
we propose here is closely related to "classical" Dynamic 
Programming (DP) - based approaches [ASG07]. In these 
problems, the set of allowed specifications is restricted to 
reaching a given destination state, whereas our PCTL control 
framework allows for richer, temporal logic specifications 
and multiple destinations. In [SWT09], the authors solve 
the problem of reaching a given destination while avoiding 



obstacles using the Rapidly-exploring Random Tree (RRT) 
algorithm that takes into account local reachability, as defined 
by differential constraints. In our approach, in order to obtain 
an MDP, a tree is also constructed (see Sec. |IV-B| ), but we 
take into account reachability under uncertainty. Moreover, 
our approach produces a feedback control strategy and a 
lower bound on the probability of satisfaction, whereas the 
method presented in [SWT09] only returns a collision-free 
trajectory. In addition, these methods differ from our work 
since they require precise state of the vehicle, at all times, 
whereas in our case, it is always uncertain. 

The remainder of the paper is organized as follows. In 
Sec. |ll| we introduce the necessary notation and review some 
preliminary results. We formulate the problem and outline 



the approach in Sec. Ill The discretization and quantization 
processes leading to the construction of the MDP model are 
described in Sees. 




\V\ and VI The vehicle control policy 
is obtained in Sec. |VII| Cas e studies illustrating our approach 
are presented in Sec. |VIIl| We conclude with final remarks 



and directions for future work in Sec. |IXl 

II. Preliminaries 

In this section, we provide a short and informal introduc- 
tion to Markov Decision Processes (MDP) and Probabilistic 
Computation Tree Logic (PCTL). For details, the reader is 
referred to [BKL08]. 

Definition 1 (MDP): A labeled MDP M is a tuple 
(S^so^Act^A^P^U^h), where 5" is a finite set of states; ^ ^ 
is the initial state; Act is a finite set of actions; A .S ^ 2^^^ is 
a function specifying the enabled actions at a state s; P \ S x 
Act X 5" ^ [0, 1] is a transition probability function such that 
for all states s e S and actions a eA{s): Y,s'esP{^j^j^^) — 1' 
and for all actions a^A{s) and s' G P{s^ s') = 0; n is the 
set of propositions; and h \ S is 3. function that assigns 
some propositions in n to each state of s e S. 

A path CO through an MDP is a sequence of states CO = 
sqSi . . . SiSi^i . . ., where each transition is induced by a choice 
of an action at the current step /. We denote the set of all finite 
paths by Path^^" (the MDP will be clear from the context). 

Definition 2 (MDP Control Policy): A control policy /i 
of an MDP M is a function ji : Path^^" Act that specifies 
the next action to be applied after every finite path. 

Probabilistic Computational Tree Logic (PCTL) is a prob- 
abilistic extension of CTL that includes the probabilistic 
operator Formulas of PCTL are constructed by connect- 
ing propositions from a set n using Boolean operators (-■ 
(negation), A (conjunction), and (implication)), tempo- 
ral operators (Q (next), ^ (until)), and the probabilistic 
operator For example, formula ^max='}[~^^3 ^ ^4] asks 
for the maximum probability of reaching the states of an 
MDP satisfying 714, without passing through states satisfy- 
ing TTs. The more complex formula ^max=i 
^>o.5[~'^3 ^1])] asks for the maximum probability of 
eventually visiting states satisfying 7r4 and then with prob- 
ability greater than 0.5 states satisfying n\, while always 
avoiding states satisfying 7t^. Probabilistic model-checking 
tools, such as PRISM (see [KNP04]), can be used to find 



these probabilities. Simple adaptations of the model checking 
algorithms, such as the one presented in [LWABIO], can be 
used to find the corresponding control policies. 

III. Problem Formulation and Approach 

A Dubins vehicle ([Dub57]) is a unicycle with constant 
forward speed and bounded turning radius moving in a plane. 
In this paper, we consider a stochastic version of a Dubins 
vehicle, which captures actuator noise: 
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where (x,j) G and 6 G [0,27r) are the position and 
orientation of the vehicle in a world frame, u is the control 
input, U is the control constraint set, and e is a random 
variable modeling the actuator noise. We assume that £ has 
an arbitrary continuous probability distribution supported on 
the bounded interval [—Smax^^max]- The forward speed is 
normalized to 1 and p is the minimum turn radius. We denote 
the state of the system by ^ = [x, 0]^ G SE{2). 

As it will become clear later, the control strategy proposed 
in this paper works for any finite set of controls U. However, 
motivated by the fact that the optimal Dubins paths use only 
three inputs ([Dub57]), we assume 



C/ = {-l/p,0,l/p}. 



We define 



W = {u^£\ueU,£e [-£max^£max]} 

as the set of applied control inputs, i.e, the set of angular 
velocities that are applied to the system in the presence 
of noise. We assume that time is uniformly discretized 
(partitioned) into stages (intervals) of length At, where stage 
k is from (^ — 1)A^ to kAt. The duration of the motion is finite 
and it is denoted by KAt. []We denote the control input and 
the applied control input at stage k 2iS Uk ^ U and Wk ^W, 
respectively. 

We assume that the noise £ is piece-wise constant, i.e, it 
can only change at the beginning of a stage. This assumption 
is motivated by practical applications, in which a servo 
motor is used as an actuator for the turning angle (see e.g., 
[Maz04]). This implies that the applied control is also piece- 
wise constant, i.e., w : [{k — l)At,kAt] ^W, k = l,...,K, 
is constant over each stage. We assume that the vehicle is 
equipped with only one sensor, which is a limited accuracy 
gyroscope. At stage k, this returns the measured interval 
[w^,W/^] C [H'k~£maxi Uk + ^max] containing the applied control 
input. 

The vehicle moves in a planar environment that is 
partitionecj^into a set of polytopic regions R. Let Runsafe C R 
denote a set of unsafe regions, and Rsafe = ^X^unsafe define 

^ Since PCTL has infinite time semantics, after KAt the system remains 
in the state achieved at KAt. In Remark 1 (Sec. |VI-A) , we explain how to 
determine K. 

^Throughout the paper, we relax the notion of a partition by allowing 
regions to share bundaries. 



the set of safe regions. One set of regions Rp C Rsafe is 
labeled with "pick-up", and another set Rd C Rsafe is labeled 
with "drop-off. In this work, we assume that the motion 
specification is as follows: 

Specification 1: "Starting from an initial state qinu, the 
vehicle is required to reach a pick-up region to pick up a 
load. Then, the vehicle should go to a drop-off region to 
drop off the load. At all times, the vehicle should avoid the 
unsafe regions." 

Let n = {Trp^TTd^TTu} be a set of propositions, where 
TTp^TTd, and TTu label the pick-up, drop-off, and unsafe regions, 
respectively. We define [lip] = {{x^y) G ]R^|(x,j) G UreRp^}^ 
[%] = {(^.3^) ^ ^^\{^.y) ^ ^reRA and [tTu] = {{x,y) G 
M^|(x,j) G ^reRunsafe^} positions that satisfy 

propositions Tip, and TTu, respectively. 

We assume that the vehicle can precisely determine its 
initial state qMt = [xinit,yinit,Oinit]^ in a known map of the 
environment. While the vehicle moves, gyroscope measure- 
ments [w^,Wyt] are available at each stage k. We define a 
vehicle control strategy as a map that takes as input a se- 
quence of measured intervals [wi, wi] [w25^2] • • • [>l^^-i,>^^-i] 
and returns the control input uj^^U at stage k. We are ready 
to formulate the main problem that we consider in this paper: 

Problem 1: Given a partitioned environment R, a vehicle 
model described by Eqn. ([T]) with initial state qinu, a motion 
task in the form of Specification [T] find a vehicle control 
strategy that maximizes the probability of satisfying the 
specification. 

The requirement that the vehicle maximizes the probability 
of satisfying Specification [T] translates to the following PCTL 
formula: 

[-^Ttu^^ {^Ttu A Tip A ^>o [-TT^^ {^TluATld)])]- 

(2) 

To fully specify Problem [T] we need to define the satisfaction 
of a PCTL formula by a trajectory q : [O.KAt] SE{2) 
of the system from Eqn. ([T]). For G 2^, let [0] be the set 
of all positions in satisfying all and only propositions 
;r G 0. The word corresponding to a state trajectory q{t) is 3. 
sequence o = 010203 Ok^ 2^, ^ > 1, generated according 
to the following rules, for all t G [O.KAt] and ^ G N, ^ > 1: (i) 
{x{0),y{0)) e[oi]; (ii) if {x{t),y{t)) G [ok] mdok^Ok^u then 
3t^>t s.t. a) {x{t^)MO) G [om] and b) {x{T),y{T)) ^ [n], 
Vt G [t,t'], Vtt en\{okUok^i); (iii) if {x{KAt),y{KAt)) G 
[ok] then Oi = o^ V/ > k. Informally, the word produced by 
q{t) is the sequence of sets of satisfied propositions as time 
evolves. A trajectory q(t) satisfies PCTL formula if and 
only if the word generated according to the rules stated above 
satisfies the formula. 

In this paper, we develop an approximate solution to 
Problem [T] consisting of three steps. First, by discretizing the 
noise interval, we define a finite subset of the set of possible 
applied control inputs. We use this to define a Quantized 
System (QS) that approximates the original system given 
by Eqn. ([T]). Second, we capture the uncertainty in the 
position of the vehicle and map the QS to an MDP. Finally, 
we find a control policy for the MDP that maximizes the 



probability of satisfying the specification, and translate this 
policy to a vehicle control strategy. In addition, we show that 
the probability that the original system under the obtained 
control strategy satisfies is bounded from bellow by the 
obtained maximum probability. 

IV. Approximation 
A. Quantized System 

We use qk{t) and w^, t e [{k— 1)A^,M^], k= 1, . . . to 
denote the state trajectory and the constant applied control 
at stage k, respectively. With a slight abuse of notation, we 
use qk to denote the end of state trajectory qk{t), i.e., q^ = 
qk{kAt). Given a state qk-i, the state trajectory qk{t) can be 
derived by integrating the system given by Eqn. ([T]) from the 
initial state qk-i, and taking into account that the applied 
control is constant and equal to wj^. Throughout the paper, 
we will also denote this trajectory by qk{qk-i,Wk,t), when 
we want to explicitly capture the initial state q^-i and the 
constant applied control Wk. 

Motivated by practical applications, we assume that the 
measurement resolution of the gyroscope, i.e., the length of 
[w^,Wy^], is constant, and we denote it by Ae. For simplicity 
of presentation, we also assume that nAe = lEmax^ for some 
^ G Z+. However, the following approach works as long as 
Ae < 2£max- Let = -emax + (/ - 1 )Ae and e,- = -emax + ^'Ae, 
/= l,...,f7. Then, [—^maxi^max] Can be partitionecj^ into n 
intervals: [e^ , / = 1 , . . . , ^. 

For each interval we define a representative value = 
/= 1,...,^, i.e., Ei is the midpoint of interval 
We denote the set of all noise intervals and the set of their 
representative values as S = {[^i,^ 1 ],..., and E = 

respectively. At stage k, the gyroscope returns 
the measured interval [uk — £kjUk-\-£k] containing the applied 
control input, where [s^jSk] ^ ^- Note that, since eU is 
known, we can obtain obtain the noise interval [s^jSk] ^ 
containing the noise at stage k, from the measured interval. 

Recall that e is a random variable with an arbitrary 
continuous probability distribution supported on the bounded 
interval [—Smax^^max]- In this paper, we assume uniform 
distribution, but our approach is general in the sense that it 
holds for any continuous distribution supported on a bounded 
interval. For the uniform distribution the following holds: 

Pr(£ e = = = i (3) 

^^max ^^max n 

[e^-,e,-]G^, /=!,..., ^. 

We define = {w + e | u eU,£ e E} CW as a finite 
set of applied control inputs. Also, let 0) : ^ be a 
random variable, where (o{u) = u-\-£ with the probability 
mass function P(o{co{u) = u-\-£) = £ e E. This probability 
follows from Eqn. Q since £ e E is the representative value 
of interval G (o. Finally, we define a Quantized System 
(QS) that approximates the original system as follows: The 

^Throughout the paper, we relax the notion of a partition by allowing the 
endpoints of the intervals to overlap. 



set of applied control inputs in QS is Wj; for a state q^- 
and a control input Uk^U, QS returns 



(4) 



with probability ^, where £ eE. Since for u^^U the applied 

control input is + e G with probability ^ , the returned 

state trajectory at stage k is qk{qk-\ ,Uk-\-£,t) with probability 
1 

n ' 

B. Reachability graph 

We denote wiW2 . . . w^, in which each Uk^U gives a control 
input at stage k, as a finite sequence of control inputs of 
length K. We use to denote the set of all such sequences. 
For the initial state qtnit and Z^, we define the reachability 
graph GK{qinit) (see [LaV06] for a related definition), which 
encodes the set of all state trajectories originating from qinu 
that can be obtained, with a positive probability, by applying 
sequences of control inputs from according to QS given 
by Eqn. Q. In Fig. [T] we give an example of a reachability 
graph. 
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1. The projection of reachability graph G2>{qinit) in M when U = 
, 0, 1 } and -0. 1,0,0.1} with = 1 .2. Magenta objects represent 



the states of the vehicle. 

Note that, by using a gyroscope with a finer measurement 
resolution (i.e., by decreasing Ae), a more dense reachability 
graph can be obtained. In the theoretical limit, as Ae 0, 
GKiqinit) approaches the set of all trajectories, originating 
from qinit, that can be generated by the original system. 

V. Position Uncertainty 

Since the specification (see Eqn. ([2])) is a statement about 
the propositions satisfied by the regions in the partitioned 
environment, in order to answer whether some state trajec- 
tory satisfies PCTL formula 0, it is sufficient to know its 
projection in R^. Therefore, we focus only on the position 
uncertainty. 



The position uncertainty of the vehicle when its nominal 
position is (x, G M? is modeled as a disc centered at (x, 
with radius <^ G M, where ^ denotes the uncertainty: 

D{{x,yU) = {{x',y') G R^\\\{x,y),{x' ,y')\\ < (5) 

where 1 1 • 1 1 denotes the Euclidian distance. Next, we explain 
how to obtain ^. 

Any state trajectory q{t) ^ GK{qinit), t ^ [0,^A^], can 
be partitioned into K state trajectories: qk{t) = q{t'), t' ^ 
[{k-\)^t,k^t], k= 1,...,^ (see Fig. [5]). We denote the 
uncertainty at state q^ as Let u^^^k^ be the applied 
control input at stage k such that qk{t) = qk{qk-\^Uk^ £k^t)^ 
k= 1, . . . with qo = qinu. Then, we set the uncertainty at 
state qu = [xk.yk.^kV ^Q^^l to: 

^k= max {\\{xk,yk),{x\y'))\\} where 



Qk{t)=^Mj,_^^^k^£k^t) and qj^(t) = qj^{qj^_^,Uk- 



for k - 



^k^^k-i 



-^k-,t)-, 

(6) 



1 , . . . , ^, where q^ = qQ = qinit- 
Eqn. ^ is obtained using a worst case scenario assump- 
tion. If Uk-\-£k ^ is the applied control input for QS, the 
corresponding applied control input at stage k for the original 
system is in [u^ — £k^Uk-\-£k], where £k G [£kj'^k] ^ ^- The 
position of the end state of the original system at stage k 
would be the farthest (in the Euclidean sense) from q^, if 
the applied control input was either Ui-\-£i, i = 1, . . . or 
Ui-\-£i, i= 1,...,^ (see [FMAG98] for more details). An 
example is given in Fig. |2] 

From Eqn. ^ it follows that, given a state trajectory 
q{t) G GKiqinit), t ^ [0,^A^], the uncertainty is increasing as 
a function of time. The way the uncertainty changes along 
q{t) makes it difficult to characterize the exact shape of the 
position uncertainty region. Instead, we use a conservative 
approximation of the region. We define : [0,^Ar] ^ R as 
an approximated uncertainty trajectory and we set ^{t) = ^k, 
t e [{k— l)At^kAt], k= 1, . . . i.e., we set the uncertainty 
along the state trajectory qk{t) equal to the maximum value 
of the uncertainty along qk{t), which is at state qk. An 
example illustrating this idea is given in Fig. [2] 

In this work, we assume that the forward speed is con- 
stant and normalized to 1. The uncertainty model presented 
above can be extended to take into account forward speed 
uncertainty as shown in [FMAG98]. 

VI. Construction of an MDP Model 

A. Satisfying (j) under uncertainty 

Based on the rules presented in Sec. Ill to guarantee that 
a state trajectory q(t) G GK{qinit) satisfies (Eqn. ^) when 
the uncertainty trajectory is (^(^), ^ G [0,^Ar], the following 
conditions need to be satisfied: (i) D{{x{t)^y{t))^t,{t)) C [Tip] 
for some t G [0,/rAr], (ii) D{{x{KA),y{KA)),t,{KA)) C [n^] 
and (iii) D{{x{t),y{t)),t,{t)) H [tt^] = for all t G [0,^A^]. If 
satisfied, these conditions guarantee that a pickup region is 
entered, the end state is inside a drop-off region, and Runsafe 
is not entered along q(t) when the uncertainty trajectory is 




Fig. 2. Above: Evolution of the position uncertainty along the state 
trajectory q{t), where q{t) is partitioned into 3 state trajectories, qkit), 
1,2,3. Below: The conservative approximation of region D((x{t),y{t)),^{t)) 
along the state trajectory q(t) = [x(t),y(t),^(t)]'^ , when the uncertainty 
trajectory is ^{t') = ^k{t), t' G [{k- l)Af,A:Ar], where t,j,{t) ^^]^,k^ 1,2,3. 



Assume q{t) and ^{t) are partitioned into K state 
and uncertainty trajectories, respectively, s.t. qk{t) = q{t') 
and t,k(t) = ^{0, t' e [{k- l)At,kAt], k = 1,...,K. 
Then, the conditions stated above can be written as: 
(i) D{{xk{t),yk{t)),^k{t)) C [Tip] for some t e [{k - 
l)At,kAt] and some k, (ii) D{{xK,yK),^K) ^ and (iii) 
D{{xk{t)Mt)).^k{t)) n [Ttu] = for all t e [{k-l)At,kAt] 
and all k. Thus, by analyzing qk{t) when the uncertainty 
trajectory is ^^{t), k = . . . ^K, can answer if ^(^), when 
the uncertainty trajectory is ^{t), is satisfying. 

Remark 1: Note that the vehicle is subject to cumulative 
and unbounded position uncertainty. Even though there are 
results on how to overcome the cumulative nature of the 



position uncertainty (e.g., see [FMAG98]), in this work we 
assume that qinu and the environment are such that 3K for 
which a satisfying state trajectory exists (if the assumption 
holds, then such a K can always be found by using a 
gyroscope with a finer measurement resolution, which leads 
to a more dense reachability graph). If this assumption is 
violated (e.g., the minimum turn radius is too large to enter 
a pick-up region without entering Runsafe), then there is 
no solution to the problem. For the rest of this paper, we 
use the smallest (the first) K for which a satisfying state 
trajectory can be found. Using the smallest K also reduces 



the computational complexity (see Sec. VII). 



B. MDP construction 

A labeled MDP M that models the motion of the vehicle 
in the environment and the evolution of position uncertainty 
is defined as a tuple (S^so^Act^A^P^U^h) where: 
• 5" is the finite set of states. For every state trajectory 
qk{t) ^ GK^qinit), t e [{k-l)At,kAt], ^= l,...,^, a state of 
the MDP is created. The meaning of the state is as follows: 
{q{t)^q^q^£^£^S) G S means that along the state trajectory 
q{t), the uncertainty trajectory is 



max ||(x,3;),(y,y)||, 



where [x^y^G]^ is the end state of q{t); The noise interval 
is [£,£] e (f; For e 2^: (i) Ttp e 0, (ii) tTj G 0, and (iii) 
G 0, mean that (i) it can be guaranteed that a pick-up 
region is entered, (ii) it can be guaranteed that the end state 
is inside of a drop-off region, and (iii) it is possible to enter 
^unsafe, along the state trajectory q{t) when the uncertainty 
trajectory is ^{t) (see Fig. [s] for an example). Note that ^{t) 
is not an element of a state explicitly since it can be obtained 
from q, q, and q (Eqn. ([6])). 

• '^o = {qinit,qinit,qinit,^X^init) ^ S IS the initial state, where 
^init ^ 2^ is the set of propositions satisfied at qinit- 

• Act = UU(p is the set of actions, where (p is a dummy 
action; 

• A \ S ^ 2^^^ gives the enabled actions at state s\ at 
termination time, A{s) = <p, otherwise A{s) = U\ 

• P : S X Act X S ^ [0, 1] is a transition probability function 
(its construction is described below); 

• 11 = {Tip, Tr^^TTu} is the set of propositions; 

• h \ S ^ 2^ assigns propositions from II to states s e S 
according to the following rule: for s = (^(^),^,^,e,e,0), 
then Tip e h{s) iff Tip G 0, ;rj G h{s) iff % G 0, and Tlu G h{s) 
iff Tlu G 0. 

We generate S and P while building Gk {qinit) starting from 
qinit • Algorithm [T] takes as inputs a state s e S corresponding 
to some state trajectory qk-i{t) and an applied control input 
Uk-\-£k, and generates the new state of the MDP and updates 
S and P. First, given the end state of qk-i{t) and the applied 
control input Uk-\-£k G Wj, the state trajectory at stage k, 
qk{t), is obtained (line 2). Then, using and qk_i, and 
the fact that £k G [£k^^k] ^ we obtain and qj^ (line 3). 
The uncertainty trajectory along qk{t), ^k{t) follows from 
Eqn. ([6]) (line 4). Using the conditions stated in Sec. |VI-A 



Algorithm 1: Generating S and P 



Input: s eS, UkEU, SkE E, S, P 
Output: S, P 

1 {qk-i(t),qj^_^,qk_i,£k_i,£k-uek-l) = 

2 qk(t) = [xk{t),yk{t),ekit)]^=qk(qk-_uUk + £k,ty,_ 

3 q^{t) = q^{q^_^,Uk + £kfy. qki^) =qk{^k-l^Uk + £kA, 

4 (^^(0 =max[^,y^0,]re|^^^j \\{xk.yk)X^'.y')\\\ 

5 if 3? G [(k-\)^t,k^t\ s.t. D{{xk{t),yk{t),^k{t)) ^ [%] then 

6 [_ ©k = ®k^{7^ph 

7 if D{{xk,yk),^k) ^ M then 

8 L % = %U{7r4,; 

9 if 3? G [(/:-l)A^M^] D((x;^(0,jytW,4W)n W ^0 
then 

10 [_ ek = ekU{nu}; 



{qk(t)A^, qk,^k^ ^k, %); P{s, Uk,s') 



the algorithm checks if it can be guaranteed that a pick-up 
region is entered (hnes 5 — 6), that the end state is inside 
of a drop-off region (hnes 7 — 8), and if it is possible to 
enter Runsafe (lines 9—10) along qk{t) when the uncertainty 
trajectory is t,k{t). 

Finally, the newly generated state, s' , is added to 5" and the 
transition probability function is updated, i.e., P{s^Uk-,s') = ^ 
(line 11). This follows from the fact that given a control 
input Uk^U the applied control input will be Uk-\-£k ^ 
with probability ^, since pa){co{uk) = Uk-\- £k) = ^ E 
(see the MDP fragment in Fig. |3]). When constructing a state 
s corresponding to a state trajectory qxit) ^ GK{qinit)^ i-^-. 
when the termination time is reached, we set A{s) = (p with 
P{s,(p,s) = 1. 

Proposition 1: The model M defined above is a valid 
MDP, i.e., it satisfies the Markov property and P is a 
transition probability function. 

Proof: The proof follows from the construction of the MDP. 
Given a current state s E S and an action u E A{s), the 
conditional probability distribution of future states depends 
only on the current state s, not on the sequences of events that 
preceded it (see Alg. [T]). Thus, the Markov property holds. 
In addition, since Y.e^E P(o{(^{i^) = w + e) = 1, it follows that 
P is a valid transition probability function. ■ 

Proposition 2: Let ^O'^'i • • - ^ be the path through the 
MDP corresponding to a state trajectory q{t) e GK{qmit)^ 
t e [O.KAt], i.e., Sk = {qk{t),qj^,qk,£i^,£k,Sk) e S is such 
that qk{t) =q{t'), t' G [{k-\)At,kAt], k=l,...,K. Also, let 
Uk^U he such that qk{t) = qk{qk-\^Uk^ £kit), ^ = 1, . . . 
where £k ^ E, £k E i^k^^k] ^ and qo = qinu- Then, if 
word oqOi . ..Ok, where Oi = h{si), / = 0, 1, . . . satisfies 
PCTL formula 0, the following holds: (i) q{t) when the 
uncertainty trajectory is t,{t), such that ^{t') = t' G 

[{k— l)At,kAt], k= l,...,/r, where ^k{t) is given by Eqn. 
([6]), satisfies 0, and (ii) any state trajectory of the original 
system q\t') =q[{t), t' e [{k- l)At,kAt], k=l,...,K, such 
that q'j^{t) = q'^{q[_^,Uk^£[,t), where £[ G [^ki'^k] ^ ^ and 

^The element under the over-line is repeated infinitely since A(sk) = (p 
and P(sk,(P,sk) = 1. 




Fig. 3. Above: An example scenario corresponding to the MDP fragment 
shown below. For the state trajectory q\{t) = [x{t),y{t),e{t)]'^ , t e [At,2At], 
when the uncertainty trajectory is ^2 (0 ^ ^2 following holds: (i) it 
can be guaranteed that a pick-up region is entered (i.e., 3t e [At,2At] s.t. 
D{{x{t),y{t)),^(t)) C [Kp]), (ii) the end state is inside of a drop-off region 
(i.e., D{{x{2At),y{2At)),^{2At)) C [ttj]), and (iii) Runsafe is not entered 
(i.e., Vr G [At,2At], D{{x{t),y{t)),^{t)) D [ttJ = 0). Thus, 0^ = {7ip,7id}. 
Similarly, ©2 = {7rp,7rj} but ^ {^p^^u}- Below: A fragment of the 
MDP corresponding to the scenario shown above, where [ — ^maxi^max] is 
partitioned into n^3 intervals. Action G A(^) enables three transitions, 
each w.p. ^ . This corresponds to applied control input being equal to ^2 + 4 
w.p. ^, £2 G The elements of the new states are: ^2(0 ^ ^2(^1 '"2 + ^2' 0^ 

^^(0 = ^^(^^,4 + e2'0; ^2(0 = ^2(^i'4+^250' fe2'^2] e ^ is S.t. £'2 e 

[£2, £2]' ®2 i^ given above, where / = 1,2,3. 



^0 = ^init^ satisfies (j). 

Proof: For part (i) the proof follows from the conditions 
stated in Sec. IVI-AI and from the construction of the MDP. 
For part (ii) note that due to the conservative approximation 
of the uncertainty region, q^{t) C D{{x{t),y{t)),^{t)), \/t e 
[O^KAt]. Thus, it follows from (i) that q\t) satisfies (j). ■ 

VII. Vehicle Control Strategy 

A. PCTL control policy generation 

We use the PCTL control synthesis approach from 
[LWABIO] to generate a control pohcy for the MDP M. The 
tool takes as input an MDP and a PCTL formula (j) and 
returns the control policy that maximizes the probability of 
satisfying (j), denoted /i, as well as the corresponding proba- 
bility value, denoted V, where V : S ^ [0^1]. Specifically, for 
s e S, 11 (s) e A{s) is the action to be applied at s and V{s) 
is the probability of satisfying the specification at s under 
control policy /i. The tool is based on the off-the-shelf PCTL 



model-checking tool PRISM (see [KNP04]). We use Matlab 
to construct MDP M, which together with (j) is passed to the 
PCTL control synthesis tool. The computational complexity 
of this step is as follows: Given U, E and K, the size of 
the MDP M is bounded above by {\U\ x |^|)^. The time 
complexity of the control synthesis algorithm is polynomial 
in the size of the MDP and linear in the number of temporal 
operators in the formula. 

B. Obtaining a vehicle control strategy 

Let [wi,wi][w2,W2] . . . [wy^,Wyt] be a sequence of mea- 
sured intervals, where [WpW/] = [ui-\- £i^Ui-\-£i], Ui G U and 
[^p^i] ^ / = 1,...,^. This corresponds to a unique path 

through the MDP: > si > S2...Sk-\ > 

Sk, where each transition is induced by a choice of action 
and the noise interval. The uniqueness follows from the 
construction of MDP M and the fact that the noise interval 
is an element of a state. Given a sequence of measured 
intervals, we define a mapping function in the form of a 
finite sequence ^ = {i/Ai, . . . , \i/k} where y/k : {U x S)^ S, 
s.t. 

for k=l,...,K. 

The desired vehicle control strategy is in the form of a 
finite sequence F = {70, 7i, . . . 7k-i}, where 70 = /i(^o) ^ U 
and Yk'-iUx gf U, s.t. 

for k = 1 , . . . , — 1 . At stage k, the control input is = 
rk-i{{uu[£^,£i]) . . .{uk-u[£k_i,£k-i])) ^U. Thus, given a 
sequence of measured intervals, F returns the control input 
for the next stage by mapping the sequence to the state of 
the MDP; the control input corresponds to the optimal action 
at that state. 

Theorem 1: The probability that the system given by Eqn. 
([T]), under the control strategy F, generates a trajectory that 
satisfies PCTL formula (Eqn. ([2])) is bounded from below 
by y{so), where V{so) is the probability of satisfying on 
the MDP, under the control policy jl. 

Proof: Considering the original system, let Uk^£k, k = 
1 , . . . , ^, be an applied control input at stage k, such that 

Uk = rk-i{{uu[£i,£i])...{uk-u[£k_i,£k-i])) e U and £k G 
[—^maxi ^max] • Then, [t/y^ + Cy^, Uk-\-£k] is the measured interval 
at stage k, such that £k e [£k,£k\ ^ ^- Let q(t), t e [O.KAt], 
be the resulting state trajectory of the original system with 
^(0) =qinit- 

The sequence of measured intervals corresponds to a 
unique path through the MDP, denoted ^-q^i . . . where — 
'^k{{u\\^\M\)--\uk\£jifi]^). k=l,...,K. This sequence 
of states produces word o = oqOi ...ok, where Oi = h{si), 
/ = 0, 1, . . . The produced word can: (i) satisfy (j) and (ii) 
not satisfy (j). Let us first consider the former. 

If the word satisfies (j) from Proposition 2 it follows that 
q{t) also satisfies (j). Under F, the probability of generating 



a state trajectory such that £k e [£k^£k], ^=1,...,^, is 
equivalent to the probability of generating sqSi . . .sk under 
/i. Since under /i the probability that the MDP generates a 
satisfying word is V{so) it follows that the probability that 
the original system under F generates a satisfying trajectory 
is also V{so). 

To show that V (^0) is the lower bound we need to consider 
the latter case. It is sufficient to observe that because of the 
conservative approximation of the uncertainty region it is 
possible that q{t) satisfies (j), even though word o does not 
satisfy it. Therefore, it follows that the probability that the 
original system, under the optimal control startegy F, will 
generate a satisfying trajectory is bounded from below by 
V{so). As a final remark, note that the bound obtained in 
this work approaches the true probability of satisfying (j) in 
the theoretical limit, as Ae ^ 0. ■ 

VIII. Case study 

We considered the system given by Eqn. ([T]) and we used 
the following numerical values: 1/p = 7r/3, At = 1.2, and 
^max = 0-06 with n = 3, i.e., Ae = 0.04. Thus, the maximum 
actuator noise was approximately 6% of the maximum 
control input. 

Three case studies are shown in Fig. |4] The maximum 
probability of satisfying PCTL formula (j) (Eqn. ([2])) on the 
MDPs corresponding to cases A, B and C are 0.981, 0.874 
and 0.892, respectively. For all three case studies, we found 
that K = 6 was enough as a terminal time, and we found 
the vehicle control strategies through the method described 
in Sec. |VII[ To verify our result, we simulated the original 
system under the obtained vehicle control strategies. 

In Fig. |4j we show sample state trajectories and in Table 
1 we compare the satisfaction probabilities obtained on the 
MDP with the simulation based satisfaction probabilities 
(number of satisfying trajectories over the number of gener- 
ated trajectories). The results support Theorem [T] since the 
simulation based probabilities are bounded from below by 
the theoretical probabilities. 

TABLE I 

Theoretical and simulation based probabilities of satisfying 
the specification. 



Case 


Theoretical Probability 

(^(^0)) 


Number of generated trajectories 


10-^ 


1 5- 10-^ II 10^ 


Proba 


3ilities from simulations 


A 


0.981 


1 


1 


1 


B 


0.874 


0.902 


0.926 


0.928 


C 


0.892 


0.913 


0.931 


0.935 



For each case study, the constructed MDP had approxi- 
mately 20000 states. The Matlab code used to construct the 
MDP ran for approximately 4 minutes on a computer with a 
2. 5 GHz dual processor. The control synthesis tool generated 
an optimal policy in about 1 minute. 

IX. Conclusion and future work 

We developed a feedback control strategy for a stochastic 
Dubins vehicle such that the probability of satisfying a 
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Fig. 4. 20 sample state (position) trajectories for cases A, B, and C (to be 
read top-to-bottom). The unsafe, pick-up, and the drop-off regions are shown 
in red, blue and green, respectively. Satisfying and violating trajectories are 
shown in black and red, respectively. In case A, all state trajectories were 
satisfying. 



temporal logic statement over some environmental properties 
is maximized. Through discretization and quantization, we 
translated this problem to finding a control policy maximiz- 
ing the probability of satisfying a PCTL formula on an MDR 
We showed that the probability that the vehicle satisfies 
the specification in the original environment is bounded 
from below by the maximum probability of satisfying the 
specification on the MDR 

Future work includes extensions of this approach to con- 
trolling different types of vehicle models (e.g., stochastic car- 
like vehicles with uncertainties in both the forward speed 



and the turning rate), allowing for richer temporal logic 
specifications, and experimental validations. 

References 

[ADBS08] A. Abate, A. D'Innocenzo, Maria D. Di Benedetto, and 
S. Shankar Sastry. Markov Set-Chains as Abstractions of 
Stochastic Hybrid Systems, volume 1. Springer Verlag, 2008. 

[ASG07] R. Alterovitz, T. Simeon, and K. Goldberg. The stochastic 
motion roadmap: A sampling framework for planning with 
markov motion uncertainty. Proceedings of Robotics: Science 
and Systems, 2007. 

[BKL08] C. Baier, J. R Katoen, and K. M. Larsen. Principles of Model 
Checking. MIT Press, 2008. 

[BKVIO] A. Bhatia, L. E. Kavraki, and M. Y. Vardi. Motion planning 
with hybrid dynamics and temporal goals. In IEEE Conf on 
Decision and Control, pages 1108-1115, Atlanta, OA, 2010. 

[CGP99] E. Clarke, O. Grumberg, and D. A. Peled. Model Checking. 
The MIT Press, 1999. 

[DABS08] A. D'Innocenzo, A. Abate, M. D. Di Benedetto, and S. Shankar 
Sastry. Approximate abstractions of discrete-time controlled 
stochastic hybrid systems. In IEEE Conf. on Decision and 
Control, December 2008. 

[Dub57] L. E. Dubins. On curves of minimal length with a constraint 
on average curvature, and with prescribed initial and terminal 
positions and tangents. American Journal of Mathematics, 
79(3):497-516, 1957. 

[FMAG98] T. Fraichard, R. Mermond, and R. Alpes-Gravir. Path planning 
with uncertainty for car-like robots. In IEEE Conf. on Robotics 
and Automation, pages 27-32, 1998. 

[Gir07] A. Girard. Approximately bisimilar finite abstractions of stable 
linear systems. In Proceedings of the 10th international confer- 
ence on Hybrid systems: computation and control, HSCC'07, 
pages 231-244, Berlin, Heidelberg, 2007. Springer- Verlag. 

[JP09] A. A. Julius and G. J. Pappas. Approximations of stochastic 
hybrid systems. IEEE Transactions on Automatic Control, 
54(6): 1193 -1203, June 2009. 

[KB08a] M. Kloetzer and C. Belta. Dealing with non- determinism in 
symbolic control. In M. Egerstedt and B. Mishra, editors. 
Hybrid Systems: Computation and Control: 11th International 
Workshop, Lecture Notes in Computer Science, pages 287-300. 
Springer Berlin / Heidelberg, 2008. 

[KB08b] M. Kloetzer and C. Belta. A fully automated framework for 
control of linear systems from temporal logic specifications. 
IEEE Transactions on Automatic Control, 2008. 

[KF08] S. Karaman and E. Frazzoli. Vehicle routing problem with 
metric temporal logic specifications. In IEEE Conf. on Decision 
and Control, 2008. 

[KGFP07] H. Kress-Gazit, G. E. Fainekos, and G. J. Pappas. Where's 
waldo? sensor-based temporal logic motion planning. In IEEE 
International Conference on Robotics and Automation, pages 
3116-3121, 2007. 

[KNP04] M. Kwiatkowska, G. Norman, and D. Parker. Probabilistic 
symbolic model checking with PRISM: A hybrid approach. In- 
ternational Journal on Software Tools for Technology Transfer, 
6(2):128-142, 2004. 

[LaV06] S. M. LaValle. Planning Algorithms. Cambridge University 
Press, Cambridge, U.K., 2006. 

[LWABIO] M. Lahijanian, J. Wasniewski, S. B. Andersson, and C. Belta. 

Motion planning and control from temporal logic specifications 
with probabilistic satisfaction guarantees. In Proc. 2010 IEEE 
International Conference on Robotics and Automation, 2010. 

[Maz04] M. Mazo. Robust area coverage using hybrid control. In 
TELEC'04, Santiago de Cuba, Cuba, 2004. 

[SWT09] A. Shkolnik, M. Walter, and R. Tedrake. Reachabihty-guided 
sampling for planning under differential constraints. In IEEE 
International Conference on Robotics and Automation, 2009. 

[TP06] P. Tabuada and G. J. Pappas. Linear time logic control of 
discrete-time linear systems. IEEE Transactions on Automatic 
Control, 2006. 

[YTC+12] B. Yordanov, J. Tumova, I. Cerna, J. Barnat, and C. Belta. Tem- 
poral logic control of discrete-time piecewise affine systems. 
IEEE Transactions on Automatic Control, 2012. to appear. 



